2,058 research outputs found

    Evolving concurrent Petri net models of epistasis

    A genetic algorithm is used to learn a non-deterministic Petri net-based model of non-linear gene interactions, or statistical epistasis. Petri nets are computational models of concurrent processes. However, certain global assumptions (e.g. transition priorities) are often required in order to convert a non-deterministic Petri net into a simpler deterministic model for easier analysis and evaluation. We show, by converting a Petri net into a set of state trees, that it is possible to retain Petri net non-determinism (i.e. allowing local interactions only, thereby making the model more realistic) while still learning useful Petri nets with practical applications. Our Petri nets produce predictions of genetic disease risk that match assessments derived from clinical data with over 92% accuracy.
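
    A minimal sketch (not from the paper) of the state-tree idea: a toy non-deterministic Petri net, with a hypothetical two-gene structure, whose reachable markings are expanded into a tree by firing every enabled transition at every node, so no global priorities are needed.

```python
# Toy non-deterministic Petri net; all place/transition names are illustrative.
# A transition consumes one token from each input place and produces one token
# on each output place; it is enabled iff all its input places are marked.
TRANSITIONS = {
    "t1": (("g1",), ("expr1",)),
    "t2": (("g2",), ("expr2",)),
    "t3": (("g1", "g2"), ("risk",)),
}

def enabled(marking):
    return [t for t, (ins, _) in TRANSITIONS.items()
            if all(marking.get(p, 0) > 0 for p in ins)]

def fire(marking, t):
    ins, outs = TRANSITIONS[t]
    m = dict(marking)
    for p in ins:
        m[p] -= 1
    for p in outs:
        m[p] = m.get(p, 0) + 1
    return m

def state_tree(marking, depth=3):
    """Expand every enabled transition at every node, keeping the net's
    non-determinism explicit instead of resolving it with priorities."""
    if depth == 0:
        return (marking, [])
    return (marking, [(t, state_tree(fire(marking, t), depth - 1))
                      for t in enabled(marking)])

print(state_tree({"g1": 1, "g2": 1}))
```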

    Modelling epistasis in genetic disease using Petri nets, evolutionary computation and frequent itemset mining

    Petri nets are useful for mathematically modelling disease-causing genetic epistasis. A Petri net model of an interaction has the potential to lead to biological insight into the cause of a genetic disease. However, defining a Petri net by hand for a particular interaction is extremely difficult because of the sheer complexity of the problem and the degrees of freedom inherent in a Petri net’s architecture. We therefore propose a novel method, based on evolutionary computation and data mining, for automatically constructing Petri net models of non-linear gene interactions. The method comprises two main steps. Firstly, an initial partial Petri net is set up with several repeated sub-nets that model individual genes, and a set of constraints, comprising relevant common-sense and biological knowledge, is also defined. These constraints characterise the class of Petri nets that are desired. Secondly, this initial Petri net structure and the constraints are used as the input to a genetic algorithm. The genetic algorithm searches for a Petri net architecture that is both a superset of the initial net and also conforms to all of the given constraints. The genetic algorithm evaluation function that we employ gives equal weighting to both the accuracy of the net and its parsimony. We demonstrate our method using an epistatic model related to the presence of digital ulcers in systemic sclerosis patients that was recently reported in the literature. Our results show that although individual “perfect” Petri nets can frequently be discovered for this interaction, the true value of this approach lies in generating many different perfect nets, and applying data mining techniques to them in order to elucidate common and statistically significant patterns of interaction.
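
    The evaluation function above weights accuracy and parsimony equally. A minimal sketch of what such a fitness function could look like, assuming a hypothetical CandidateNet encoding; none of these names or the scoring details come from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class CandidateNet:
    """Illustrative stand-in for an evolved Petri net, not the authors' encoding."""
    arcs: set
    n_initial_arcs: int                        # arcs of the fixed initial sub-nets
    rule: dict = field(default_factory=dict)   # genotype -> predicted outcome

    def predict(self, genotype):
        return self.rule.get(genotype, 0)

def fitness(net, cases, constraints):
    # Nets violating the hand-defined biological constraints score zero.
    if not all(check(net) for check in constraints):
        return 0.0
    accuracy = sum(net.predict(g) == y for g, y in cases) / len(cases)
    # Parsimony: fewer arcs beyond the fixed initial structure is better.
    extra_arcs = len(net.arcs) - net.n_initial_arcs
    parsimony = 1.0 / (1.0 + extra_arcs)
    return 0.5 * accuracy + 0.5 * parsimony    # equal weighting, per the abstract

net = CandidateNet(arcs={("g1", "t1"), ("t1", "risk")}, n_initial_arcs=2,
                   rule={("A", "A"): 1})
cases = [(("A", "A"), 1), (("A", "B"), 0)]
print(fitness(net, cases, constraints=[lambda n: len(n.arcs) >= n.n_initial_arcs]))
```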

    Extension of the survival dimensionality reduction algorithm to detect epistasis in competing risks models (SDR-CR)

    Background: The discovery and description of the genetic background of common human diseases is hampered by their complexity and dynamic behavior. Appropriate bioinformatic tools are needed to account for all the facets of complex diseases, and to this end we recently described the survival dimensionality reduction (SDR) algorithm in the effort to model gene–gene interactions in the context of survival analysis. When one event precludes the occurrence of another event under investigation, in the ‘competing risk model’, survival algorithms require particular adjustment to avoid the risk of reporting wrong or biased conclusions.
    Methods: The SDR algorithm was modified to incorporate the cumulative incidence function as well as an adapted version of the Brier score for mutually exclusive outcomes, to better search for epistatic models in the competing risk setting. The applicability of the new SDR algorithm (SDR-CR) was evaluated using synthetic lifetime epistatic datasets with competing risks and on a dataset of scleroderma patients.
    Results/conclusions: The SDR-CR algorithm retains a satisfactory power to detect the causative variants in simulated datasets under different scenarios of sample size and degrees of type I or type II censoring. In the real-world dataset, SDR-CR was capable of detecting a significant interaction between the IL-1α C-889T and the IL-1β C-511T single-nucleotide polymorphisms to predict the occurrence of restrictive lung disease vs. isolated pulmonary hypertension. We provide a useful extension of the SDR algorithm to analyze epistatic interactions in the competing risk setting that may be of use to unveil the genetic background of complex human diseases.
    Availability: http://sourceforge.net/projects/sdrproject/files/
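
    The cumulative incidence function that SDR-CR incorporates can be estimated nonparametrically. Below is a minimal sketch of the standard Aalen–Johansen-style estimator for one cause in a competing-risks sample (assuming, for simplicity, one subject per event time); it illustrates the quantity involved, not the SDR-CR implementation, which is available at the SourceForge link above.

```python
import numpy as np

def cumulative_incidence(times, events, cause):
    """times: event/censoring times; events: 0 = censored, 1, 2, ... = cause codes.
    Returns (time, CIF) pairs for the given cause."""
    times, events = np.asarray(times, float), np.asarray(events, int)
    order = np.argsort(times)
    surv, cif, at_risk, out = 1.0, 0.0, len(times), []
    for t, e in zip(times[order], events[order]):
        if e == cause:
            cif += surv / at_risk        # cause-specific hazard contribution at t
        if e != 0:
            surv *= 1.0 - 1.0 / at_risk  # any event lowers overall survival
        at_risk -= 1
        out.append((t, cif))
    return out

# Toy data: cause 1 vs. competing cause 2, with one censored subject (0).
print(cumulative_incidence([2, 3, 5, 7, 9], [1, 2, 0, 1, 2], cause=1))
```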

    Abatacept to treat chronic intestinal pseudo-obstruction in five systemic sclerosis patients with a description of the index case

    Chronic intestinal pseudo-obstruction is a severe complication of systemic sclerosis. Inflammatory neuropathy and immunological alterations have a prominent role in the development of systemic sclerosis…

    Online Packing to Minimize Area or Perimeter

    We consider online packing problems where we get a stream of axis-parallel rectangles. The rectangles have to be placed in the plane without overlapping, and each rectangle must be placed without knowing the subsequent rectangles. The goal is to minimize the perimeter or the area of the axis-parallel bounding box of the rectangles. We either allow rotations by 90° or translations only. For the perimeter version we give algorithms with an absolute competitive ratio slightly less than 4, both when only translations are allowed and when rotations are also allowed. We then turn our attention to minimizing the area and show that the competitive ratio of any algorithm is at least Ω(√n), where n is the number of rectangles in the stream, and this holds with and without rotations. We then present algorithms that match this bound in both cases; the competitive ratio is thus optimal to within a constant factor. We also show that the competitive ratio cannot be bounded as a function of Opt. We then consider two special cases. The first is when all the given rectangles have aspect ratios bounded by some constant. The particular variant where all the rectangles are squares and we want to minimize the area of the bounding square has been studied before, and an algorithm with a competitive ratio of 8 has been given [Fekete and Hoffmann, Algorithmica, 2017]. We improve the analysis of the algorithm and show that the ratio is at most 6, which is tight. The second special case is when all edges have length at least 1. Here, the Ω(√n) lower bound still holds, and we turn our attention to lower bounds depending on Opt. We show that any algorithm for the translational case has a competitive ratio of at least Ω(√Opt). If rotations are allowed, we show a lower bound of Ω(Opt^{1/4}). For both versions, we give algorithms that match the respective lower bounds: with translations only, this is just the algorithm from the general case with competitive ratio O(√n) = O(√Opt). If rotations are allowed, we give an algorithm with competitive ratio O(min{√n, Opt^{1/4}}), thus matching both lower bounds simultaneously.
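
    To make the online constraint concrete, here is a minimal greedy sketch, not one of the paper's algorithms and with no competitive guarantee: each arriving rectangle is placed, irrevocably, at whichever side of the current bounding box least increases the box's perimeter. The candidate-position rule and all names are illustrative only.

```python
def pack_online(rects):
    """rects: iterable of (w, h); returns placements and final bounding-box perimeter."""
    placed = []                      # (x, y, w, h) for each placed rectangle
    x0 = y0 = x1 = y1 = 0.0          # current bounding box
    for w, h in rects:
        best = None
        # Candidates outside the current box (right, left, top, bottom),
        # so the new rectangle cannot overlap anything placed so far.
        for px, py in [(x1, y0), (x0 - w, y0), (x0, y1), (x0, y0 - h)]:
            nx0, ny0 = min(x0, px), min(y0, py)
            nx1, ny1 = max(x1, px + w), max(y1, py + h)
            per = 2 * ((nx1 - nx0) + (ny1 - ny0))
            if best is None or per < best[0]:
                best = (per, px, py, nx0, ny0, nx1, ny1)
        _, px, py, x0, y0, x1, y1 = best
        placed.append((px, py, w, h))
    return placed, 2 * ((x1 - x0) + (y1 - y0))

print(pack_online([(3, 1), (1, 4), (2, 2)]))
```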

    Approximate Earth Mover's Distance in Truly-Subquadratic Time

    Full text link
    We design an additive approximation scheme for estimating the cost of the min-weight bipartite matching problem: given a bipartite graph with non-negative edge costs and ε > 0, our algorithm estimates the cost of matching all but an O(ε)-fraction of the vertices in truly subquadratic time O(n^{2−δ(ε)}). Our algorithm has a natural interpretation for computing the Earth Mover's Distance (EMD), up to an ε-additive approximation. Notably, we make no assumptions about the underlying metric (more generally, the costs do not have to satisfy the triangle inequality). Note that compared to the size of the instance (an arbitrary n × n cost matrix), our algorithm runs in sublinear time. Our algorithm can approximate a slightly more general problem: max-cardinality bipartite matching with a knapsack constraint, where the goal is to maximize the number of vertices that can be matched up to a total cost B.
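
    For contrast, the exact quantity being approximated is a min-weight perfect matching on an arbitrary n × n cost matrix, which exact solvers compute only after reading all n² entries. A small sketch using SciPy's Hungarian-method solver as the exact baseline (illustrative, not the paper's algorithm):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n = 6
cost = rng.random((n, n))                  # arbitrary non-negative costs,
                                           # no triangle inequality assumed

rows, cols = linear_sum_assignment(cost)   # Hungarian method, superquadratic
emd = cost[rows, cols].sum() / n           # normalised matching cost (EMD)
print(f"exact EMD = {emd:.4f}")
```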

    Optical method to measure mesh tensioning

    This paper presents a method to estimate the tensional status of a knitted mesh. To this end, the relationship between the mesh's vibration frequencies, recorded by a high-sampling-rate camera and analysed through image processing, and the different tension levels applied to the mesh has been investigated. After several tests, all of the collected frequency–tension pairs were used to extrapolate an optimal (in a least-squares sense) correlation between the vibration frequency and the tension of the mesh.
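
    A minimal sketch of the final least-squares step, assuming (purely for illustration) a taut-string-like model in which tension grows with the square of the vibration frequency; the abstract does not specify the functional form, and the numbers below are made up.

```python
import numpy as np

freq_hz = np.array([12.0, 15.5, 18.3, 21.0, 23.2])   # from image processing
tension_n = np.array([5.0, 8.4, 11.9, 15.6, 19.1])   # applied tensions (N)

# Least-squares fit of T = a*f^2 + b (linear in f^2).
coeffs = np.polyfit(freq_hz**2, tension_n, deg=1)
a, b = coeffs
print(f"T(f) ~= {a:.4f} * f^2 + {b:.3f}")

# Estimate the tension of a mesh vibrating at an unseen frequency.
print(np.polyval(coeffs, 20.0**2))
```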

    Multi-Swap k-Means++

    The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular k-means clustering objective and is known to give an O(log k)-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting k-means++ with O(k log log k) local search steps obtained through the k-means++ sampling distribution to yield a c-approximation to the k-means clustering problem, where c is a large absolute constant. Here we generalize and extend their local search algorithm by considering larger and more sophisticated local search neighborhoods, hence allowing multiple centers to be swapped at the same time. Our algorithm achieves a 9 + ε approximation ratio, which is the best possible for local search. Importantly, our approach yields substantial practical improvements: we show significant quality improvements over the approach of Lattanzi and Sohler (ICML 2019) on several datasets. Comment: NeurIPS 202
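
    A minimal sketch of the two building blocks discussed above: D² (k-means++) sampling and one local-search step whose candidate is drawn from that same distribution. For brevity it swaps a single center per step, in the style of Lattanzi and Sohler; the paper's contribution, swapping several centers simultaneously, is not reproduced here.

```python
import numpy as np

def sq_dists(X, centers):
    """Squared distance of every point to its nearest center."""
    C = np.array(centers)
    return np.min(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)

def d2_sample(X, centers, rng):
    """Sample a point with probability proportional to its squared distance
    to the nearest center (the k-means++ sampling distribution)."""
    d2 = sq_dists(X, centers)
    return X[rng.choice(len(X), p=d2 / d2.sum())]

def cost(X, centers):
    return sq_dists(X, centers).sum()

rng = np.random.default_rng(1)
X = rng.random((200, 2))
k = 5

centers = [X[rng.integers(len(X))]]               # first center uniform
while len(centers) < k:
    centers.append(d2_sample(X, centers, rng))    # k-means++ seeding

# One local-search step: sample a candidate, swap it for the center whose
# replacement yields the lowest cost, and keep the swap if cost improves.
cand = d2_sample(X, centers, rng)
best = min(range(k), key=lambda i: cost(X, centers[:i] + [cand] + centers[i+1:]))
trial = centers[:best] + [cand] + centers[best+1:]
if cost(X, trial) < cost(X, centers):
    centers = trial
print(f"k-means cost after seeding + 1 swap: {cost(X, centers):.4f}")
```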

    Locally Uniform Hashing

    Hashing is a common technique used in data processing, with a strong impact on the time and resources spent on computation. Hashing also affects the applicability of theoretical results that often assume access to (unrealistic) uniform/fully-random hash functions. In this paper, we are concerned with designing hash functions that are practical and come with strong theoretical guarantees on their performance. To this end, we present tornado tabulation hashing, which is simple, fast, and exhibits a certain full, local randomness property that provably makes diverse algorithms perform almost as if (abstract) fully-random hashing was used. For example, this includes classic linear probing, the widely used HyperLogLog algorithm of Flajolet, Fusy, Gandouet, and Meunier [AofA 07] for counting distinct elements, and the one-permutation hashing of Li, Owen, and Zhang [NIPS 12] for large-scale machine learning. We also provide a very efficient solution for the classical problem of obtaining fully-random hashing on a fixed (but unknown to the hash function) set of n keys using O(n) space. As a consequence, we get more efficient implementations of the splitting trick of Dietzfelbinger and Rink [ICALP'09] and the succinct space uniform hashing of Pagh and Pagh [SICOMP'08]. Tornado tabulation hashing is based on a simple method to systematically break dependencies in tabulation-based hashing techniques. Comment: FOCS 202
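
    For background, a minimal sketch of plain simple tabulation hashing, the base scheme that tabulation-based methods like tornado hashing build on: the key is split into characters, each character indexes its own table of random words, and the lookups are XORed. The derived "tornado" characters that systematically break the remaining dependencies are the paper's contribution and are omitted here.

```python
import random

C, BITS = 8, 8                        # eight 8-bit characters: 64-bit keys
random.seed(42)                       # fixed seed so the demo is reproducible
TABLES = [[random.getrandbits(64) for _ in range(1 << BITS)] for _ in range(C)]

def simple_tabulation(key64):
    h = 0
    for i in range(C):
        char = (key64 >> (BITS * i)) & ((1 << BITS) - 1)
        h ^= TABLES[i][char]          # one table lookup per character
    return h

print(hex(simple_tabulation(0xDEADBEEFCAFEBABE)))
```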